Overview

Dataset statistics

Number of variables: 14
Number of observations: 70995
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.6 MiB
Average record size in memory: 112.0 B

Variable types

NUM: 12
CAT: 2

Warnings

odometer is highly skewed (γ1 = 35.00114843) Skewed
df_index has unique values Unique
manufacturer has 747 (1.1%) zeros Zeros
condition has 35365 (49.8%) zeros Zeros
fuel has 3457 (4.9%) zeros Zeros
type has 19443 (27.4%) zeros Zeros
paint_color has 13268 (18.7%) zeros Zeros

Reproduction

Analysis started: 2020-10-10 16:21:53.583697
Analysis finished: 2020-10-10 16:22:31.400535
Duration: 37.82 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 70995
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 42740.91792
Minimum: 0
Maximum: 85319
Zeros: 1
Zeros (%): < 0.1%
Memory size: 554.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 4184.7
Q1: 21471.5
Median: 42640
Q3: 64067.5
95th percentile: 81184.3
Maximum: 85319
Range: 85319
Interquartile range (IQR): 42596

Descriptive statistics

Standard deviation: 24679.43159
Coefficient of variation (CV): 0.577419316
Kurtosis: -1.19866326
Mean: 42740.91792
Median Absolute Deviation (MAD): 21302
Skewness: -0.001148208366
Sum: 3034391468
Variance: 609074343.7
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
Value                  Count    Frequency (%)
2047                   1        < 0.1%
53903                  1        < 0.1%
57993                  1        < 0.1%
64138                  1        < 0.1%
62091                  1        < 0.1%
51852                  1        < 0.1%
49805                  1        < 0.1%
55950                  1        < 0.1%
14994                  1        < 0.1%
29339                  1        < 0.1%
Other values (70985)   70985    > 99.9%

Value                  Count    Frequency (%)
0                      1        < 0.1%
1                      1        < 0.1%
2                      1        < 0.1%
3                      1        < 0.1%
4                      1        < 0.1%

Value                  Count    Frequency (%)
85319                  1        < 0.1%
85318                  1        < 0.1%
85317                  1        < 0.1%
85315                  1        < 0.1%
85314                  1        < 0.1%

price
Real number (ℝ≥0)

Distinct: 3795
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12367.47668
Minimum: 1050
Maximum: 39999
Zeros: 0
Zeros (%): 0.0%
Memory size: 554.6 KiB

Quantile statistics

Minimum: 1050
5th percentile: 2895
Q1: 5990
Median: 9950
Q3: 16578
95th percentile: 30910.2
Maximum: 39999
Range: 38949
Interquartile range (IQR): 10588

Descriptive statistics

Standard deviation: 8575.313181
Coefficient of variation (CV): 0.693376135
Kurtosis: 0.8157469837
Mean: 12367.47668
Median Absolute Deviation (MAD): 4950
Skewness: 1.165281363
Sum: 878029007
Variance: 73535996.16
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count    Frequency (%)
6995                   912      1.3%
7995                   892      1.3%
8995                   884      1.2%
5995                   829      1.2%
4500                   779      1.1%
3500                   768      1.1%
9995                   754      1.1%
5500                   745      1.0%
6500                   738      1.0%
4995                   692      1.0%
Other values (3785)    63002    88.7%

Value                  Count    Frequency (%)
1050                   4        < 0.1%
1095                   1        < 0.1%
1099                   1        < 0.1%
1100                   21       < 0.1%
1111                   2        < 0.1%

Value                  Count    Frequency (%)
39999                  17       < 0.1%
39998                  3        < 0.1%
39997                  19       < 0.1%
39995                  58       0.1%
39994                  1        < 0.1%

year
Real number (ℝ≥0)

Distinct: 21
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2010.758687
Minimum: 2001
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Memory size: 554.6 KiB

Quantile statistics

Minimum: 2001
5th percentile: 2003
Q1: 2007
Median: 2011
Q3: 2014
95th percentile: 2018
Maximum: 2021
Range: 20
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 4.695733948
Coefficient of variation (CV): 0.002335304568
Kurtosis: -0.9003727376
Mean: 2010.758687
Median Absolute Deviation (MAD): 4
Skewness: -0.1536677832
Sum: 142753813
Variance: 22.04991731
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=21)
Value                  Count    Frequency (%)
2013                   5512     7.8%
2011                   5293     7.5%
2012                   5135     7.2%
2014                   5014     7.1%
2008                   4985     7.0%
2015                   4733     6.7%
2007                   4436     6.2%
2017                   4334     6.1%
2010                   4219     5.9%
2016                   3975     5.6%
Other values (11)      23359    32.9%

Value                  Count    Frequency (%)
2001                   1290     1.8%
2002                   1726     2.4%
2003                   2154     3.0%
2004                   2832     4.0%
2005                   3249     4.6%

Value                  Count    Frequency (%)
2021                   6        < 0.1%
2020                   407      0.6%
2019                   1884     2.7%
2018                   2383     3.4%
2017                   4334     6.1%

manufacturer
Real number (ℝ≥0)

ZEROS

Distinct: 39
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18.9175294
Minimum: 0
Maximum: 41
Zeros: 747
Zeros (%): 1.1%
Memory size: 554.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 5
Q1: 10
Median: 16
Q3: 31
95th percentile: 39
Maximum: 41
Range: 41
Interquartile range (IQR): 21

Descriptive statistics

Standard deviation: 11.56409725
Coefficient of variation (CV): 0.6112900369
Kurtosis: -0.9902455298
Mean: 18.9175294
Median Absolute Deviation (MAD): 9
Skewness: 0.5631120537
Sum: 1343050
Variance: 133.7283451
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=39)
Value                  Count    Frequency (%)
13                     12778    18.0%
7                      10105    14.2%
39                     6085     8.6%
16                     4845     6.8%
31                     4337     6.1%
20                     3095     4.4%
10                     2787     3.9%
14                     2563     3.6%
37                     2140     3.0%
17                     2128     3.0%
Other values (29)      20132    28.4%

Value                  Count    Frequency (%)
0                      747      1.1%
1                      6        < 0.1%
2                      3        < 0.1%
3                      784      1.1%
4                      1908     2.7%

Value                  Count    Frequency (%)
41                     558      0.8%
40                     1814     2.6%
39                     6085     8.6%
38                     8        < 0.1%
37                     2140     3.0%

model
Real number (ℝ≥0)

Distinct: 8075
Distinct (%): 11.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5119.081414
Minimum: 0
Maximum: 10038
Zeros: 1
Zeros (%): < 0.1%
Memory size: 554.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 664
Q1: 2792
Median: 5133
Q3: 7623
95th percentile: 9452
Maximum: 10038
Range: 10038
Interquartile range (IQR): 4831

Descriptive statistics

Standard deviation: 2765.129139
Coefficient of variation (CV): 0.5401611961
Kurtosis: -1.124779596
Mean: 5119.081414
Median Absolute Deviation (MAD): 2361
Skewness: -0.02279244518
Sum: 363429185
Variance: 7645939.158
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count    Frequency (%)
3936                   1044     1.5%
8065                   953      1.3%
3634                   620      0.9%
1273                   571      0.8%
2010                   561      0.8%
56                     556      0.8%
1163                   534      0.8%
2295                   480      0.7%
3785                   472      0.7%
4651                   437      0.6%
Other values (8065)    64767    91.2%

Value                  Count    Frequency (%)
0                      1        < 0.1%
1                      1        < 0.1%
3                      1        < 0.1%
5                      1        < 0.1%
6                      1        < 0.1%

Value                  Count    Frequency (%)
10038                  1        < 0.1%
10037                  1        < 0.1%
10034                  1        < 0.1%
10033                  6        < 0.1%
10032                  1        < 0.1%

condition
Real number (ℝ≥0)

ZEROS

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.101246567
Minimum: 0
Maximum: 5
Zeros: 35365
Zeros (%): 49.8%
Memory size: 554.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 1
Q3: 2
95th percentile: 3
Maximum: 5
Range: 5
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.159579255
Coefficient of variation (CV): 1.052969689
Kurtosis: -1.363802462
Mean: 1.101246567
Median Absolute Deviation (MAD): 1
Skewness: 0.3314757873
Sum: 78183
Variance: 1.344624048
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Value                  Count    Frequency (%)
0                      35365    49.8%
2                      25768    36.3%
3                      7848     11.1%
1                      1684     2.4%
4                      231      0.3%
5                      99       0.1%

Value                  Count    Frequency (%)
0                      35365    49.8%
1                      1684     2.4%
2                      25768    36.3%
3                      7848     11.1%
4                      231      0.3%

Value                  Count    Frequency (%)
5                      99       0.1%
4                      231      0.3%
3                      7848     11.1%
2                      25768    36.3%
1                      1684     2.4%

cylinders
Real number (ℝ≥0)

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.436227903
Minimum: 0
Maximum: 7
Zeros: 305
Zeros (%): 0.4%
Memory size: 554.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 3
Q1: 3
Median: 5
Q3: 5
95th percentile: 6
Maximum: 7
Range: 7
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.256988924
Coefficient of variation (CV): 0.2833463364
Kurtosis: -1.04221849
Mean: 4.436227903
Median Absolute Deviation (MAD): 1
Skewness: -0.2797058224
Sum: 314950
Variance: 1.580021154
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)
Value                  Count    Frequency (%)
3                      26987    38.0%
5                      26299    37.0%
6                      16377    23.1%
4                      730      1.0%
0                      305      0.4%
7                      147      0.2%
2                      133      0.2%
1                      17       < 0.1%

Value                  Count    Frequency (%)
0                      305      0.4%
1                      17       < 0.1%
2                      133      0.2%
3                      26987    38.0%
4                      730      1.0%

Value                  Count    Frequency (%)
7                      147      0.2%
6                      16377    23.1%
5                      26299    37.0%
4                      730      1.0%
3                      26987    38.0%

fuel
Real number (ℝ≥0)

ZEROS

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.934023523
Minimum: 0
Maximum: 4
Zeros: 3457
Zeros (%): 4.9%
Memory size: 554.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 2
Median: 2
Q3: 2
95th percentile: 2
Maximum: 4
Range: 4
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.4923988553
Coefficient of variation (CV): 0.2545981729
Kurtosis: 11.70410423
Mean: 1.934023523
Median Absolute Deviation (MAD): 0
Skewness: -2.14675949
Sum: 137306
Variance: 0.2424566327
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=5)
Value                  Count    Frequency (%)
2                      65746    92.6%
0                      3457     4.9%
3                      1060     1.5%
4                      634      0.9%
1                      98       0.1%

Value                  Count    Frequency (%)
0                      3457     4.9%
1                      98       0.1%
2                      65746    92.6%
3                      1060     1.5%
4                      634      0.9%

Value                  Count    Frequency (%)
4                      634      0.9%
3                      1060     1.5%
2                      65746    92.6%
1                      98       0.1%
0                      3457     4.9%

odometer
Real number (ℝ≥0)

SKEWED

Distinct: 28733
Distinct (%): 40.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 114037.944
Minimum: 0
Maximum: 9400000
Zeros: 208
Zeros (%): 0.3%
Memory size: 554.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 17581
Q1: 68690
Median: 109567
Q3: 150000
95th percentile: 212907.5
Maximum: 9400000
Range: 9400000
Interquartile range (IQR): 81310

Descriptive statistics

Standard deviation: 103329.2903
Coefficient of variation (CV): 0.9060956965
Kurtosis: 2522.564326
Mean: 114037.944
Median Absolute Deviation (MAD): 40695
Skewness: 35.00114843
Sum: 8096123835
Variance: 1.067694223e+10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count    Frequency (%)
140000                 275      0.4%
150000                 269      0.4%
130000                 268      0.4%
120000                 247      0.3%
170000                 242      0.3%
160000                 232      0.3%
180000                 222      0.3%
0                      208      0.3%
145000                 200      0.3%
135000                 199      0.3%
Other values (28723)   68633    96.7%

Value                  Count    Frequency (%)
0                      208      0.3%
1                      59       0.1%
2                      1        < 0.1%
3                      2        < 0.1%
4                      1        < 0.1%

Value                  Count    Frequency (%)
9400000                1        < 0.1%
8888888                1        < 0.1%
7465644                1        < 0.1%
7441104                1        < 0.1%
4900000                1        < 0.1%

transmission
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 554.6 KiB

Value                  Count    Frequency (%)
0                      64854    91.4%
1                      4292     6.0%
2                      1849     2.6%
 
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

drive
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 554.6 KiB

Value                  Count    Frequency (%)
1                      29964    42.2%
0                      28331    39.9%
2                      12700    17.9%
 
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

type
Real number (ℝ≥0)

ZEROS

Distinct: 13
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.907303331
Minimum: 0
Maximum: 12
Zeros: 19443
Zeros (%): 27.4%
Memory size: 554.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 8
Q3: 9
95th percentile: 11
Maximum: 12
Range: 12
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 4.235857361
Coefficient of variation (CV): 0.7170543179
Kurtosis: -1.515071859
Mean: 5.907303331
Median Absolute Deviation (MAD): 2
Skewness: -0.3816937941
Sum: 419389
Variance: 17.94248758
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=13)
Value                  Count    Frequency (%)
9                      21106    29.7%
0                      19443    27.4%
10                     8272     11.7%
8                      5186     7.3%
3                      3656     5.1%
4                      3277     4.6%
11                     2787     3.9%
12                     2431     3.4%
5                      2339     3.3%
2                      1466     2.1%
Other values (3)       1032     1.5%

Value                  Count    Frequency (%)
0                      19443    27.4%
1                      64       0.1%
2                      1466     2.1%
3                      3656     5.1%
4                      3277     4.6%

Value                  Count    Frequency (%)
12                     2431     3.4%
11                     2787     3.9%
10                     8272     11.7%
9                      21106    29.7%
8                      5186     7.3%

paint_color
Real number (ℝ≥0)

ZEROS

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.653623495
Minimum: 0
Maximum: 11
Zeros: 13268
Zeros (%): 18.7%
Memory size: 554.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 8
Q3: 9
95th percentile: 10
Maximum: 11
Range: 11
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 3.991704175
Coefficient of variation (CV): 0.7060435097
Kurtosis: -1.580678747
Mean: 5.653623495
Median Absolute Deviation (MAD): 2
Skewness: -0.2968371636
Sum: 401379
Variance: 15.93370222
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
Value                  Count    Frequency (%)
10                     16772    23.6%
0                      13268    18.7%
9                      11462    16.1%
5                      8747     12.3%
1                      7717     10.9%
8                      6925     9.8%
2                      1828     2.6%
4                      1700     2.4%
3                      1555     2.2%
11                     437      0.6%
Other values (2)       584      0.8%

Value                  Count    Frequency (%)
0                      13268    18.7%
1                      7717     10.9%
2                      1828     2.6%
3                      1555     2.2%
4                      1700     2.4%

Value                  Count    Frequency (%)
11                     437      0.6%
10                     16772    23.6%
9                      11462    16.1%
8                      6925     9.8%
7                      217      0.3%

state
Real number (ℝ≥0)

Distinct: 51
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24.7617297
Minimum: 0
Maximum: 50
Zeros: 656
Zeros (%): 0.9%
Memory size: 554.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 4
Q1: 12
Median: 24
Q3: 37
95th percentile: 48
Maximum: 50
Range: 50
Interquartile range (IQR): 25

Descriptive statistics

Standard deviation: 14.62857776
Coefficient of variation (CV): 0.590773663
Kurtosis: -1.319492316
Mean: 24.7617297
Median Absolute Deviation (MAD): 13
Skewness: -0.03124648915
Sum: 1757959
Variance: 213.9952872
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count    Frequency (%)
4                      5188     7.3%
9                      4975     7.0%
34                     4316     6.1%
35                     3444     4.9%
48                     3233     4.6%
22                     3209     4.5%
43                     2884     4.1%
38                     2585     3.6%
27                     2514     3.5%
45                     2443     3.4%
Other values (41)      36204    51.0%

Value                  Count    Frequency (%)
0                      656      0.9%
1                      1083     1.5%
2                      523      0.7%
3                      894      1.3%
4                      5188     7.3%

Value                  Count    Frequency (%)
50                     206      0.3%
49                     112      0.2%
48                     3233     4.6%
47                     711      1.0%
46                     974      1.4%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that, for a linear relationship, the angle of the line to the x-axis does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
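The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code the report generator itself runs; the function name `pearson_r` and the sample values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and of both variances cancel,
    # so plain sums of (cross-)deviations are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sample values, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)
# Shifting and positively rescaling either variable leaves r unchanged.
r_shifted = pearson_r([10 * x + 3 for x in xs], [0.5 * y - 7 for y in ys])
```

Here `r` and `r_shifted` agree up to floating-point rounding, which illustrates the location and scale invariance mentioned above.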

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
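As a rough sketch of that recipe: rank each variable, then apply the Pearson formula to the ranks. Assigning tied values the average of their ranks is an assumption made here (it is a common convention); the helper names and the sample data are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of X and Y."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# A nonlinear but strictly increasing relationship (hypothetical data).
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
rho = spearman_rho(xs, ys)
```

Because `ys` increases whenever `xs` does, the two rank vectors are identical and ρ reaches its maximum, even though the relationship is not linear.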

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
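The pair-counting definition above translates almost directly into code. The sketch below implements exactly that formula, which is often called tau-a; statistical libraries frequently report a tie-adjusted variant instead, so results can differ when the data contain ties. The function name and sample values are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 is a tie in X or Y: neither concordant nor discordant
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Hypothetical sample: 8 concordant and 2 discordant pairs out of 10.
tau = kendall_tau_a([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

The double loop over all pairs is quadratic in the number of observations, which is fine for an illustration but slow on a column with tens of thousands of rows.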

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
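A minimal sketch of a bias-corrected Cramér's V follows, using the correction as it is commonly stated: the chi-squared statistic divided by n gives φ², from which (k-1)(r-1)/(n-1) is subtracted (floored at zero), and the table dimensions r and k are shrunk by (r-1)²/(n-1) and (k-1)²/(n-1). Whether the report uses precisely this form is an assumption; the function name and the contingency tables are hypothetical.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as rows of counts."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_corr = k - (k - 1) ** 2 / (n - 1)
    r_corr = r - (r - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate table (a single row or column)
    return math.sqrt(phi2_corr / denom)


# Hypothetical 2x2 tables: independent counts and perfectly associated counts.
v_independent = cramers_v_corrected([[10, 20], [20, 40]])
v_perfect = cramers_v_corrected([[50, 0], [0, 50]])
```

For the independent table the statistic is 0, and for the perfectly associated table it is 1 up to rounding, matching the range described above.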

Missing values

Sample

First rows

df_index  price  year  manufacturer  model  condition  cylinders  fuel  odometer  transmission  drive  type  paint_color  state
0016995200714802626025421700101023
111399520121339362521884060010523
227995201073552232108124000523
3389952011791972521780540001023
441099520141337852521702590001023
55129952004342722503096210010323
66109952011780652622108650010923
77124502011780652621509590010123
881599520117813626222347000101023
99119952009780652621706840010123

Last rows

df_index  price  year  manufacturer  model  condition  cylinders  fuel  odometer  transmission  drive  type  paint_color  state
7098585308145832017315669032179351101023
70986853098000200845902521006850091023
709878531112000201339701123369880014123
7098885312570020141734294321190000191032
709898531329500201539874035275000008134
7099085314160020044197960422922550012134
7099185315988520123747330328200000496
70992853174800200213643925258000023132
709938531816002006178316152159980019123
709948531990002003397841062160000000423